Analytics on GeoDataFrames - COVID-19 Case
In the selected case, we will focus on positive cases, centering on the population most vulnerable to COVID-19, which includes middle-aged adults (40-59 years) and older adults (60+ years).
First, we read the data from the file positivos_covid.csv, which uses ';' as its delimiter.
import pandas as pd
# Read the file, specifying ";" as the delimiter
covid19 = pd.read_csv("positivos_covid.csv", delimiter=';')
covid19.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4585360 entries, 0 to 4585359
Data columns (total 10 columns):
 #   Column           Dtype
---  ------           -----
 0   FECHA_CORTE      int64
 1   DEPARTAMENTO     object
 2   PROVINCIA        object
 3   DISTRITO         object
 4   METODODX         object
 5   EDAD             float64
 6   SEXO             object
 7   FECHA_RESULTADO  float64
 8   UBIGEO           float64
 9   id_persona       float64
dtypes: float64(4), int64(1), object(5)
memory usage: 349.8+ MB
#check
covid19.head()
| | FECHA_CORTE | DEPARTAMENTO | PROVINCIA | DISTRITO | METODODX | EDAD | SEXO | FECHA_RESULTADO | UBIGEO | id_persona |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20241203 | TUMBES | TUMBES | TUMBES | AG | 46.0 | FEMENINO | 20221207.0 | 240101.0 | 203499.0 |
| 1 | 20241203 | LIMA | LIMA | JESUS MARIA | AG | 69.0 | FEMENINO | 20230822.0 | 150113.0 | 221397.0 |
| 2 | 20241203 | SAN MARTIN | MOYOBAMBA | MOYOBAMBA | AG | 55.0 | FEMENINO | 20240108.0 | 220101.0 | 295651.0 |
| 3 | 20241203 | AREQUIPA | CAYLLOMA | COPORAQUE | AG | 50.0 | MASCULINO | 20230824.0 | 40506.0 | 851625.0 |
| 4 | 20241203 | LIMA | LIMA | JESUS MARIA | AG | 58.0 | MASCULINO | 20221217.0 | 150113.0 | 287786.0 |
Data cleaning
covid19 = covid19.drop(columns=['FECHA_CORTE', 'METODODX', 'id_persona'])
#check
covid19.head()
| | DEPARTAMENTO | PROVINCIA | DISTRITO | EDAD | SEXO | FECHA_RESULTADO | UBIGEO |
|---|---|---|---|---|---|---|---|
| 0 | TUMBES | TUMBES | TUMBES | 46.0 | FEMENINO | 20221207.0 | 240101.0 |
| 1 | LIMA | LIMA | JESUS MARIA | 69.0 | FEMENINO | 20230822.0 | 150113.0 |
| 2 | SAN MARTIN | MOYOBAMBA | MOYOBAMBA | 55.0 | FEMENINO | 20240108.0 | 220101.0 |
| 3 | AREQUIPA | CAYLLOMA | COPORAQUE | 50.0 | MASCULINO | 20230824.0 | 40506.0 |
| 4 | LIMA | LIMA | JESUS MARIA | 58.0 | MASCULINO | 20221217.0 | 150113.0 |
# Keep only the year (first four characters) of the FECHA_RESULTADO column
covid19['FECHA_RESULTADO'] = covid19['FECHA_RESULTADO'].astype(str).str[:4]
# Drop rows with a missing age (NaN) in EDAD
covid19 = covid19.dropna(subset=['EDAD'])
# Convert EDAD to integers to remove the ".0"
covid19['EDAD'] = covid19['EDAD'].astype(int)
#check
covid19.head()
| | DEPARTAMENTO | PROVINCIA | DISTRITO | EDAD | SEXO | FECHA_RESULTADO | UBIGEO |
|---|---|---|---|---|---|---|---|
| 0 | TUMBES | TUMBES | TUMBES | 46 | FEMENINO | 2022 | 240101.0 |
| 1 | LIMA | LIMA | JESUS MARIA | 69 | FEMENINO | 2023 | 150113.0 |
| 2 | SAN MARTIN | MOYOBAMBA | MOYOBAMBA | 55 | FEMENINO | 2024 | 220101.0 |
| 3 | AREQUIPA | CAYLLOMA | COPORAQUE | 50 | MASCULINO | 2023 | 40506.0 |
| 4 | LIMA | LIMA | JESUS MARIA | 58 | MASCULINO | 2022 | 150113.0 |
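Slicing the first four characters works here because every date is stored as a yyyymmdd number. A sketch of a more explicit alternative, assuming that same layout: parse the values as dates and take the year, so that missing or malformed values become NaN instead of strings such as 'nan'. The toy values are made up in the shape of FECHA_RESULTADO.

```python
import numpy as np
import pandas as pd

# Made-up values shaped like FECHA_RESULTADO: yyyymmdd stored as float, plus one NaN
raw = pd.Series([20221207.0, 20230822.0, np.nan, 20240108.0])

# float -> nullable integer (drops the ".0") -> text -> date; unparseable entries become NaT
dates = pd.to_datetime(raw.astype("Int64").astype(str), format="%Y%m%d", errors="coerce")

# Year of each date; NaT turns into NaN
years = dates.dt.year
print(years.tolist())
```

Rows with a missing year can then be dropped with dropna(), and implausible years filtered with a numeric comparison rather than a list of strings.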
# years in data
covid19.FECHA_RESULTADO.value_counts()
FECHA_RESULTADO
2022    2132009
2021    1307581
2020    1022565
2023      93361
2024      27074
nan        2023
1899        394
Name: count, dtype: int64
# Missing dates became the string 'nan' after astype(str): drop those rows
covid19 = covid19[~covid19['FECHA_RESULTADO'].isin(['nan'])]
# Also drop the unwanted year '1899'
covid19 = covid19[~covid19['FECHA_RESULTADO'].isin(['1899'])]
# Verify that we now have the correct period
covid19.FECHA_RESULTADO.value_counts()
FECHA_RESULTADO
2022    2132009
2021    1307581
2020    1022565
2023      93361
2024      27074
Name: count, dtype: int64
# Show the minimum and maximum of 'EDAD' to check that the values look plausible
edad_min = covid19['EDAD'].min()
edad_max = covid19['EDAD'].max()
edad_min, edad_max
(0, 125)
# Create a new column 'Grupo_Edad' with the specified age categories
# (intervals are right-closed; include_lowest=True keeps age 0 in the first bin)
covid19['Grupo_Edad'] = pd.cut(
    covid19['EDAD'],
    bins=[0, 17, 39, 59, float('inf')],
    labels=["Niños y adolescentes (0-17 años)", "Adultos jóvenes (18-39 años)", "Adultos de mediana edad (40-59 años)", "Personas mayores (60+ años)"],
    include_lowest=True
)
covid19.head()
| | DEPARTAMENTO | PROVINCIA | DISTRITO | EDAD | SEXO | FECHA_RESULTADO | UBIGEO | Grupo_Edad |
|---|---|---|---|---|---|---|---|---|
| 0 | TUMBES | TUMBES | TUMBES | 46 | FEMENINO | 2022 | 240101.0 | Adultos de mediana edad (40-59 años) |
| 1 | LIMA | LIMA | JESUS MARIA | 69 | FEMENINO | 2023 | 150113.0 | Personas mayores (60+ años) |
| 2 | SAN MARTIN | MOYOBAMBA | MOYOBAMBA | 55 | FEMENINO | 2024 | 220101.0 | Adultos de mediana edad (40-59 años) |
| 3 | AREQUIPA | CAYLLOMA | COPORAQUE | 50 | MASCULINO | 2023 | 40506.0 | Adultos de mediana edad (40-59 años) |
| 4 | LIMA | LIMA | JESUS MARIA | 58 | MASCULINO | 2022 | 150113.0 | Adultos de mediana edad (40-59 años) |
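pd.cut builds right-closed intervals by default, so with bins=[0, 17, 39, 59, inf] the first bin is (0, 17] and an age of exactly 0 is left out (NaN) unless include_lowest=True is passed. A small check on toy ages, with short labels chosen only for readability:

```python
import pandas as pd

ages = pd.Series([0, 17, 18, 39, 40, 59, 60, 125])
bins = [0, 17, 39, 59, float("inf")]
labels = ["0-17", "18-39", "40-59", "60+"]

# Default: (0, 17], (17, 39], (39, 59], (59, inf] -> age 0 gets NaN
default = pd.cut(ages, bins=bins, labels=labels)
# include_lowest=True also puts the lowest edge (0) into the first bin
inclusive = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"age": ages, "default": default, "include_lowest": inclusive}))
```

Since the analysis keeps only people aged 40 and over, this boundary mostly affects the count of the youngest group.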
covid19.Grupo_Edad.value_counts()
Grupo_Edad
Adultos jóvenes (18-39 años)            2044590
Adultos de mediana edad (40-59 años)    1485974
Personas mayores (60+ años)              726577
Niños y adolescentes (0-17 años)         308273
Name: count, dtype: int64
# Filter the DataFrame to exclude the two youngest age groups
covid19_vulnerables = covid19[~covid19['Grupo_Edad'].isin(["Niños y adolescentes (0-17 años)", "Adultos jóvenes (18-39 años)"])]
covid19_vulnerables.Grupo_Edad.value_counts()
Grupo_Edad
Adultos de mediana edad (40-59 años)    1485974
Personas mayores (60+ años)              726577
Niños y adolescentes (0-17 años)              0
Adultos jóvenes (18-39 años)                  0
Name: count, dtype: int64
Reshaping to Long
We keep only the two most vulnerable groups and count the people in each group, by year, department and province:
indexList=['FECHA_RESULTADO','DEPARTAMENTO','PROVINCIA','Grupo_Edad']
aggregator={'Grupo_Edad':[len]}
covid19_vulnerables=covid19_vulnerables.groupby(indexList,observed=True).agg(aggregator)
covid19_vulnerables
| FECHA_RESULTADO | DEPARTAMENTO | PROVINCIA | Grupo_Edad | (Grupo_Edad, len) |
|---|---|---|---|---|
| 2020 | AMAZONAS | BAGUA | Adultos de mediana edad (40-59 años) | 2580 |
| | | | Personas mayores (60+ años) | 1521 |
| | | BONGARA | Adultos de mediana edad (40-59 años) | 129 |
| | | | Personas mayores (60+ años) | 69 |
| | | CHACHAPOYAS | Adultos de mediana edad (40-59 años) | 696 |
| ... | ... | ... | ... | ... |
| 2024 | TUMBES | ZARUMILLA | Adultos de mediana edad (40-59 años) | 5 |
| | | | Personas mayores (60+ años) | 4 |
| | UCAYALI | CORONEL PORTILLO | Adultos de mediana edad (40-59 años) | 38 |
| | | | Personas mayores (60+ años) | 19 |
| | | PADRE ABAD | Adultos de mediana edad (40-59 años) | 2 |
2039 rows × 1 columns
Sending the counts to wide columns:
Covid19Draft=covid19_vulnerables.unstack(3).fillna(0) # move index level 3 (Grupo_Edad) to the columns; missing counts become 0
Covid19Draft
| FECHA_RESULTADO | DEPARTAMENTO | PROVINCIA | (Grupo_Edad, len, Adultos de mediana edad (40-59 años)) | (Grupo_Edad, len, Personas mayores (60+ años)) |
|---|---|---|---|---|
| 2020 | AMAZONAS | BAGUA | 2580.0 | 1521.0 |
| | | BONGARA | 129.0 | 69.0 |
| | | CHACHAPOYAS | 696.0 | 262.0 |
| | | CONDORCANQUI | 922.0 | 288.0 |
| | | EN INVESTIGACIÓN | 17.0 | 18.0 |
| ... | ... | ... | ... | ... |
| 2024 | TUMBES | CONTRALMIRANTE VILLAR | 0.0 | 4.0 |
| | | TUMBES | 17.0 | 15.0 |
| | | ZARUMILLA | 5.0 | 4.0 |
| | UCAYALI | CORONEL PORTILLO | 38.0 | 19.0 |
| | | PADRE ABAD | 2.0 | 0.0 |
1050 rows × 2 columns
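# share of older adults (60+) among all vulnerable positives (40+)
Covid19Draft['ALARMA_pct']=Covid19Draft.iloc[:,1]/(Covid19Draft.iloc[:,0] + Covid19Draft.iloc[:,1])
# years to columns; a province with no vulnerable cases in a year gets 0
covid19_vulnerables_Alarm_w=Covid19Draft['ALARMA_pct'].unstack('FECHA_RESULTADO').fillna(0)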
covid19_vulnerables_Alarm_w
| DEPARTAMENTO | PROVINCIA | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|
| AMAZONAS | BAGUA | 0.370885 | 0.391144 | 0.339266 | 0.533333 | 0.458333 |
| | BONGARA | 0.348485 | 0.363825 | 0.305233 | 0.500000 | 0.600000 |
| | CHACHAPOYAS | 0.273486 | 0.321394 | 0.268201 | 0.417476 | 0.440860 |
| | CONDORCANQUI | 0.238017 | 0.339367 | 0.205714 | 0.000000 | 0.000000 |
| | EN INVESTIGACIÓN | 0.514286 | 0.392857 | 0.458333 | 0.333333 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... |
| UCAYALI | ATALAYA | 0.325243 | 0.241379 | 0.344828 | 0.000000 | 0.000000 |
| | CORONEL PORTILLO | 0.387321 | 0.342441 | 0.328023 | 0.404255 | 0.333333 |
| | EN INVESTIGACIÓN | 0.335516 | 0.375000 | 0.255208 | 0.500000 | 0.000000 |
| | PADRE ABAD | 0.309686 | 0.332174 | 0.279487 | 0.071429 | 0.000000 |
| | PURUS | 0.224599 | 0.300000 | 0.172414 | 0.000000 | 0.000000 |
221 rows × 5 columns
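The steps above compute, for each province and year, ALARMA_pct = (people 60+) / (people 40-59 + people 60+), the share of older adults among the vulnerable positives. A toy sketch of the same reshaping, with hypothetical counts and short group names, makes one consequence visible: after the final fillna(0), a 0 can mean either that no older adults tested positive or that the province has no vulnerable cases that year at all.

```python
import pandas as pd

# Toy long table with hypothetical counts of positives per year, province and age group
counts = pd.DataFrame({
    "year": ["2023", "2023", "2023", "2023", "2024"],
    "province": ["BAGUA", "BAGUA", "BONGARA", "BONGARA", "BAGUA"],
    "group": ["40-59", "60+", "40-59", "60+", "60+"],
    "n": [14, 16, 5, 5, 3],
}).set_index(["year", "province", "group"])["n"]

# Age groups to columns; a missing year-province-group combination counts as 0 people
wide = counts.unstack("group").fillna(0)

# Share of older adults among all vulnerable (40+) positives
share = wide["60+"] / (wide["40-59"] + wide["60+"])

# Years to columns; a province with no row in a given year also becomes 0
result = share.unstack("year").fillna(0)
print(result)
```

Here BONGARA gets 0.0 in 2024 only because it has no data that year; leaving those cells as NaN would keep the two situations apart.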
Notice the data type:
covid19_vulnerables_Alarm_w.columns
Index(['2020', '2021', '2022', '2023', '2024'], dtype='object', name='FECHA_RESULTADO')
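The column labels are strings that start with a digit, which cannot be used as attribute names (df.2022 is not valid Python), so let's add a text prefix: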
covid19_vulnerables_Alarm_w.columns=['year'+str(x) for x in covid19_vulnerables_Alarm_w.columns]
#then
covid19_vulnerables_Alarm_w
| DEPARTAMENTO | PROVINCIA | year2020 | year2021 | year2022 | year2023 | year2024 |
|---|---|---|---|---|---|---|
| AMAZONAS | BAGUA | 0.370885 | 0.391144 | 0.339266 | 0.533333 | 0.458333 |
| | BONGARA | 0.348485 | 0.363825 | 0.305233 | 0.500000 | 0.600000 |
| | CHACHAPOYAS | 0.273486 | 0.321394 | 0.268201 | 0.417476 | 0.440860 |
| | CONDORCANQUI | 0.238017 | 0.339367 | 0.205714 | 0.000000 | 0.000000 |
| | EN INVESTIGACIÓN | 0.514286 | 0.392857 | 0.458333 | 0.333333 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... |
| UCAYALI | ATALAYA | 0.325243 | 0.241379 | 0.344828 | 0.000000 | 0.000000 |
| | CORONEL PORTILLO | 0.387321 | 0.342441 | 0.328023 | 0.404255 | 0.333333 |
| | EN INVESTIGACIÓN | 0.335516 | 0.375000 | 0.255208 | 0.500000 | 0.000000 |
| | PADRE ABAD | 0.309686 | 0.332174 | 0.279487 | 0.071429 | 0.000000 |
| | PURUS | 0.224599 | 0.300000 | 0.172414 | 0.000000 | 0.000000 |
221 rows × 5 columns
# move DEPARTAMENTO and PROVINCIA from the index back into columns
covid19_vulnerables_Alarm_w.reset_index(inplace=True)
covid19_vulnerables_Alarm_w
| | DEPARTAMENTO | PROVINCIA | year2020 | year2021 | year2022 | year2023 | year2024 |
|---|---|---|---|---|---|---|---|
| 0 | AMAZONAS | BAGUA | 0.370885 | 0.391144 | 0.339266 | 0.533333 | 0.458333 |
| 1 | AMAZONAS | BONGARA | 0.348485 | 0.363825 | 0.305233 | 0.500000 | 0.600000 |
| 2 | AMAZONAS | CHACHAPOYAS | 0.273486 | 0.321394 | 0.268201 | 0.417476 | 0.440860 |
| 3 | AMAZONAS | CONDORCANQUI | 0.238017 | 0.339367 | 0.205714 | 0.000000 | 0.000000 |
| 4 | AMAZONAS | EN INVESTIGACIÓN | 0.514286 | 0.392857 | 0.458333 | 0.333333 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 216 | UCAYALI | ATALAYA | 0.325243 | 0.241379 | 0.344828 | 0.000000 | 0.000000 |
| 217 | UCAYALI | CORONEL PORTILLO | 0.387321 | 0.342441 | 0.328023 | 0.404255 | 0.333333 |
| 218 | UCAYALI | EN INVESTIGACIÓN | 0.335516 | 0.375000 | 0.255208 | 0.500000 | 0.000000 |
| 219 | UCAYALI | PADRE ABAD | 0.309686 | 0.332174 | 0.279487 | 0.071429 | 0.000000 |
| 220 | UCAYALI | PURUS | 0.224599 | 0.300000 | 0.172414 | 0.000000 | 0.000000 |
221 rows × 7 columns
!pip install geopandas
Requirement already satisfied: geopandas in c:\users\luis\anaconda3\lib\site-packages (0.14.2) Requirement already satisfied: fiona>=1.8.21 in c:\users\luis\anaconda3\lib\site-packages (from geopandas) (1.9.5) Requirement already satisfied: packaging in c:\users\luis\anaconda3\lib\site-packages (from geopandas) (23.2) Requirement already satisfied: pandas>=1.4.0 in c:\users\luis\anaconda3\lib\site-packages (from geopandas) (2.2.2) Requirement already satisfied: pyproj>=3.3.0 in c:\users\luis\anaconda3\lib\site-packages (from geopandas) (3.6.1) Requirement already satisfied: shapely>=1.8.0 in c:\users\luis\anaconda3\lib\site-packages (from geopandas) (2.0.5) Requirement already satisfied: attrs>=19.2.0 in c:\users\luis\anaconda3\lib\site-packages (from fiona>=1.8.21->geopandas) (23.1.0) Requirement already satisfied: certifi in c:\users\luis\anaconda3\lib\site-packages (from fiona>=1.8.21->geopandas) (2024.8.30) Requirement already satisfied: click~=8.0 in c:\users\luis\anaconda3\lib\site-packages (from fiona>=1.8.21->geopandas) (8.1.7) Requirement already satisfied: click-plugins>=1.0 in c:\users\luis\anaconda3\lib\site-packages (from fiona>=1.8.21->geopandas) (1.1.1) Requirement already satisfied: cligj>=0.5 in c:\users\luis\anaconda3\lib\site-packages (from fiona>=1.8.21->geopandas) (0.7.2) Requirement already satisfied: six in c:\users\luis\anaconda3\lib\site-packages (from fiona>=1.8.21->geopandas) (1.16.0) Requirement already satisfied: setuptools in c:\users\luis\anaconda3\lib\site-packages (from fiona>=1.8.21->geopandas) (69.5.1) Requirement already satisfied: numpy>=1.26.0 in c:\users\luis\anaconda3\lib\site-packages (from pandas>=1.4.0->geopandas) (1.26.4) Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\luis\anaconda3\lib\site-packages (from pandas>=1.4.0->geopandas) (2.9.0.post0) Requirement already satisfied: pytz>=2020.1 in c:\users\luis\anaconda3\lib\site-packages (from pandas>=1.4.0->geopandas) (2024.1) Requirement already satisfied: 
tzdata>=2022.7 in c:\users\luis\anaconda3\lib\site-packages (from pandas>=1.4.0->geopandas) (2023.3) Requirement already satisfied: colorama in c:\users\luis\anaconda3\lib\site-packages (from click~=8.0->fiona>=1.8.21->geopandas) (0.4.6)
Let's read a map of the provinces (the file ProvsINEI2023.zip) directly from GitHub:
mapLink='https://github.com/SocialAnalytics-StrategicIntelligence/GeoDF_Analytics/raw/main/maps/ProvsINEI2023.zip'
import geopandas as gpd
provmap=gpd.read_file(mapLink)
provmap.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 196 entries, 0 to 195
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   OBJECTID    196 non-null    float64
 1   CCDD        196 non-null    object
 2   CCPP        196 non-null    object
 3   DEPARTAMEN  196 non-null    object
 4   PROVINCIA   196 non-null    object
 5   geometry    196 non-null    geometry
dtypes: float64(1), geometry(1), object(4)
memory usage: 9.3+ KB
Let me create a key column, location, by joining DEPARTAMEN and PROVINCIA with a '+':
provmap['location']=provmap['DEPARTAMEN']+'+'+provmap['PROVINCIA']
provmap.head(10)
| | OBJECTID | CCDD | CCPP | DEPARTAMEN | PROVINCIA | geometry | location |
|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 01 | 01 | AMAZONAS | CHACHAPOYAS | POLYGON ((-77.72614 -5.94354, -77.72486 -5.943... | AMAZONAS+CHACHAPOYAS |
| 1 | 2.0 | 01 | 02 | AMAZONAS | BAGUA | POLYGON ((-78.61909 -4.51001, -78.61802 -4.510... | AMAZONAS+BAGUA |
| 2 | 3.0 | 01 | 03 | AMAZONAS | BONGARA | POLYGON ((-77.72759 -5.14030, -77.72361 -5.140... | AMAZONAS+BONGARA |
| 3 | 4.0 | 01 | 04 | AMAZONAS | CONDORCANQUI | POLYGON ((-77.81399 -2.99278, -77.81483 -2.995... | AMAZONAS+CONDORCANQUI |
| 4 | 5.0 | 01 | 05 | AMAZONAS | LUYA | POLYGON ((-78.13023 -5.90370, -78.13011 -5.904... | AMAZONAS+LUYA |
| 5 | 6.0 | 01 | 06 | AMAZONAS | RODRIGUEZ DE MENDOZA | POLYGON ((-77.44452 -6.05002, -77.44387 -6.050... | AMAZONAS+RODRIGUEZ DE MENDOZA |
| 6 | 7.0 | 01 | 07 | AMAZONAS | UTCUBAMBA | POLYGON ((-78.09288 -5.36258, -78.09288 -5.364... | AMAZONAS+UTCUBAMBA |
| 7 | 8.0 | 02 | 01 | ANCASH | HUARAZ | POLYGON ((-77.39870 -9.35563, -77.39852 -9.356... | ANCASH+HUARAZ |
| 8 | 9.0 | 02 | 02 | ANCASH | AIJA | POLYGON ((-77.61368 -9.64900, -77.61241 -9.649... | ANCASH+AIJA |
| 9 | 10.0 | 02 | 03 | ANCASH | ANTONIO RAYMONDI | POLYGON ((-77.08856 -8.97496, -77.08804 -8.975... | ANCASH+ANTONIO RAYMONDI |
I will do the same with the data frame:
covid19_vulnerables_Alarm_w['location']=covid19_vulnerables_Alarm_w['DEPARTAMENTO']+'+'+covid19_vulnerables_Alarm_w['PROVINCIA']
covid19_vulnerables_Alarm_w.head()
| | DEPARTAMENTO | PROVINCIA | year2020 | year2021 | year2022 | year2023 | year2024 | location |
|---|---|---|---|---|---|---|---|---|
| 0 | AMAZONAS | BAGUA | 0.370885 | 0.391144 | 0.339266 | 0.533333 | 0.458333 | AMAZONAS+BAGUA |
| 1 | AMAZONAS | BONGARA | 0.348485 | 0.363825 | 0.305233 | 0.500000 | 0.600000 | AMAZONAS+BONGARA |
| 2 | AMAZONAS | CHACHAPOYAS | 0.273486 | 0.321394 | 0.268201 | 0.417476 | 0.440860 | AMAZONAS+CHACHAPOYAS |
| 3 | AMAZONAS | CONDORCANQUI | 0.238017 | 0.339367 | 0.205714 | 0.000000 | 0.000000 | AMAZONAS+CONDORCANQUI |
| 4 | AMAZONAS | EN INVESTIGACIÓN | 0.514286 | 0.392857 | 0.458333 | 0.333333 | 0.000000 | AMAZONAS+EN INVESTIGACIÓN |
Preprocessing
Names from non-English-speaking countries may contain accents and other symbols (´, ~) that break matches between tables. Let's get rid of those:
!pip install unidecode
Requirement already satisfied: unidecode in c:\users\luis\anaconda3\lib\site-packages (1.2.0)
import unidecode
byePunctuation=lambda x: unidecode.unidecode(x)
covid19_vulnerables_Alarm_w['location']=covid19_vulnerables_Alarm_w['location'].apply(byePunctuation)
provmap['location']=provmap['location'].apply(byePunctuation)
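For reference, roughly the same accent stripping can be sketched with the standard library's unicodedata instead of unidecode: decompose each character (NFKD) and drop the combining marks. This only covers accented Latin letters; unidecode also transliterates other scripts, which is why it is the more general choice.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Remove diacritics, e.g. 'Ó' -> 'O' and 'ñ' -> 'n'."""
    # NFKD splits 'Ó' into 'O' plus a combining acute accent
    decomposed = unicodedata.normalize("NFKD", text)
    # keep every character that is not a combining mark
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("AMAZONAS+EN INVESTIGACIÓN"))
print(strip_accents("Niños y adolescentes (0-17 años)"))
```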
# remove dashes, underscores and whitespace (a raw string avoids invalid-escape warnings)
covid19_vulnerables_Alarm_w['location']=covid19_vulnerables_Alarm_w.location.str.replace(r"[-_\s]+","",regex=True)
provmap['location']=provmap.location.str.replace(r"[-_\s]+","",regex=True)
Merging
We need to merge both tables now. That works well if both tables share a key column: a column (or a set of columns) whose values in one table also appear in the other.
Not every value has to appear in both tables, but only rows whose key values are exactly equal get merged.
Let's find out what is NOT matched in each table:
nomatch_df=set(covid19_vulnerables_Alarm_w.location)- set(provmap.location)
nomatch_gdf=set(provmap.location)-set(covid19_vulnerables_Alarm_w.location)
This is how many keys could not be matched in each table:
len(nomatch_df), len(nomatch_gdf)
(27, 2)
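Why these keys matter: a merge only pairs keys that are exactly equal, so a spelling variant such as NAZCA vs NASCA stays unmatched, and with how='left' the map row simply gets NaN. A toy sketch with made-up values, using the same indicator idea as the merge further on:

```python
import pandas as pd

# Toy tables (made-up values) where one key is spelled differently in each
shapes = pd.DataFrame({"location": ["ICA+NASCA", "AMAZONAS+BAGUA"]})
values = pd.DataFrame({"location": ["ICA+NAZCA", "AMAZONAS+BAGUA"], "year2022": [0.30, 0.34]})

# indicator adds a column telling where each row found its key
merged = shapes.merge(values, on="location", how="left", indicator="flag")
print(merged)
```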
One way forward is fuzzy matching, which proposes the most similar key from the other table for each unmatched one (remember we need thefuzz):
!pip install thefuzz
Requirement already satisfied: thefuzz in c:\users\luis\anaconda3\lib\site-packages (0.22.1) Requirement already satisfied: rapidfuzz<4.0.0,>=3.0.0 in c:\users\luis\anaconda3\lib\site-packages (from thefuzz) (3.10.1)
# pick the closest match from nomatch_gdf for a value in nomatch_df
from thefuzz import process
[(dis,process.extractOne(dis,nomatch_gdf)) for dis in sorted(nomatch_df)]
[('AMAZONAS+ENINVESTIGACION', ('ANCASH+ANTONIORAYMONDI', 48)),
('ANCASH+ANTONIORAIMONDI', ('ANCASH+ANTONIORAYMONDI', 95)),
('ANCASH+ENINVESTIGACION', ('ANCASH+ANTONIORAYMONDI', 59)),
('APURIMAC+ENINVESTIGACION', ('ICA+NASCA', 40)),
('AREQUIPA+ENINVESTIGACION', ('ICA+NASCA', 40)),
('AYACUCHO+ENINVESTIGACION', ('ANCASH+ANTONIORAYMONDI', 43)),
('CAJAMARCA+ENINVESTIGACION', ('ICA+NASCA', 50)),
('CALLAO+ENINVESTIGACION', ('ANCASH+ANTONIORAYMONDI', 41)),
('CUSCO+ENINVESTIGACION', ('ANCASH+ANTONIORAYMONDI', 42)),
('HUANCAVELICA+ENINVESTIGACION', ('ICA+NASCA', 50)),
('HUANUCO+ENINVESTIGACION', ('ANCASH+ANTONIORAYMONDI', 44)),
('ICA+ENINVESTIGACION', ('ICA+NASCA', 86)),
('ICA+NAZCA', ('ICA+NASCA', 89)),
('JUNIN+ENINVESTIGACION', ('ICA+NASCA', 40)),
('LALIBERTAD+ENINVESTIGACION', ('ICA+NASCA', 40)),
('LAMBAYEQUE+ENINVESTIGACION', ('ICA+NASCA', 40)),
('LIMA+ENINVESTIGACION', ('ICA+NASCA', 45)),
('LORETO+ENINVESTIGACION', ('ICA+NASCA', 40)),
('MADREDEDIOS+ENINVESTIGACION', ('ICA+NASCA', 40)),
('MOQUEGUA+ENINVESTIGACION', ('ICA+NASCA', 40)),
('PASCO+ENINVESTIGACION', ('ICA+NASCA', 48)),
('PIURA+ENINVESTIGACION', ('ICA+NASCA', 42)),
('PUNO+ENINVESTIGACION', ('ICA+NASCA', 40)),
('SANMARTIN+ENINVESTIGACION', ('ANCASH+ANTONIORAYMONDI', 43)),
('TACNA+ENINVESTIGACION', ('ICA+NASCA', 48)),
('TUMBES+ENINVESTIGACION', ('ICA+NASCA', 40)),
('UCAYALI+ENINVESTIGACION', ('ANCASH+ANTONIORAYMONDI', 40))]
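If you are comfortable with the suggestions, you prepare a dictionary of changes. Check the scores first, though: only ANCASH+ANTONIORAIMONDI (95) and ICA+NAZCA (89) are spelling variants of a real match. The EN INVESTIGACION keys (literally "under investigation", presumably cases whose province was not assigned) have no polygon in the map, so their best match is an unrelated province, sometimes with a high score (ICA+ENINVESTIGACION, 86). Applying the whole dictionary assigns their data to ANCASH+ANTONIORAYMONDI or ICA+NASCA: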
# is this OK?
{dis:process.extractOne(dis,nomatch_gdf)[0] for dis in sorted(nomatch_df)}
{'AMAZONAS+ENINVESTIGACION': 'ANCASH+ANTONIORAYMONDI',
'ANCASH+ANTONIORAIMONDI': 'ANCASH+ANTONIORAYMONDI',
'ANCASH+ENINVESTIGACION': 'ANCASH+ANTONIORAYMONDI',
'APURIMAC+ENINVESTIGACION': 'ICA+NASCA',
'AREQUIPA+ENINVESTIGACION': 'ICA+NASCA',
'AYACUCHO+ENINVESTIGACION': 'ANCASH+ANTONIORAYMONDI',
'CAJAMARCA+ENINVESTIGACION': 'ICA+NASCA',
'CALLAO+ENINVESTIGACION': 'ANCASH+ANTONIORAYMONDI',
'CUSCO+ENINVESTIGACION': 'ANCASH+ANTONIORAYMONDI',
'HUANCAVELICA+ENINVESTIGACION': 'ICA+NASCA',
'HUANUCO+ENINVESTIGACION': 'ANCASH+ANTONIORAYMONDI',
'ICA+ENINVESTIGACION': 'ICA+NASCA',
'ICA+NAZCA': 'ICA+NASCA',
'JUNIN+ENINVESTIGACION': 'ICA+NASCA',
'LALIBERTAD+ENINVESTIGACION': 'ICA+NASCA',
'LAMBAYEQUE+ENINVESTIGACION': 'ICA+NASCA',
'LIMA+ENINVESTIGACION': 'ICA+NASCA',
'LORETO+ENINVESTIGACION': 'ICA+NASCA',
'MADREDEDIOS+ENINVESTIGACION': 'ICA+NASCA',
'MOQUEGUA+ENINVESTIGACION': 'ICA+NASCA',
'PASCO+ENINVESTIGACION': 'ICA+NASCA',
'PIURA+ENINVESTIGACION': 'ICA+NASCA',
'PUNO+ENINVESTIGACION': 'ICA+NASCA',
'SANMARTIN+ENINVESTIGACION': 'ANCASH+ANTONIORAYMONDI',
'TACNA+ENINVESTIGACION': 'ICA+NASCA',
'TUMBES+ENINVESTIGACION': 'ICA+NASCA',
'UCAYALI+ENINVESTIGACION': 'ANCASH+ANTONIORAYMONDI'}
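A more cautious route accepts a suggestion only above a similarity cutoff and checks that no two keys land on the same polygon. The sketch below swaps in the standard library's difflib for thefuzz, so its scores are on a 0-1 scale and are not the numbers shown above; the 0.85 cutoff and the helper name cautious_matches are assumptions for illustration, to be tuned by inspecting the candidates.

```python
import difflib

def cautious_matches(unmatched, candidates, cutoff=0.85):
    """Hypothetical helper: map each unmatched key to its most similar
    candidate only when difflib's similarity ratio reaches `cutoff`.
    Keys below the cutoff are left out and therefore stay unmatched."""
    changes = {}
    for key in sorted(unmatched):
        best = difflib.get_close_matches(key, list(candidates), n=1, cutoff=cutoff)
        if best:
            changes[key] = best[0]
    # each polygon key should receive at most one data key
    if len(set(changes.values())) != len(changes):
        raise ValueError("two keys were mapped to the same polygon")
    return changes

# A few of the keys listed above
nomatch_df = {"ANCASH+ANTONIORAIMONDI", "ICA+NAZCA", "ICA+ENINVESTIGACION", "LIMA+ENINVESTIGACION"}
nomatch_gdf = {"ANCASH+ANTONIORAYMONDI", "ICA+NASCA"}

changes = cautious_matches(nomatch_df, nomatch_gdf)
print(changes)
```

With this mapping the EN INVESTIGACION rows keep their own keys, so a left merge from the map simply leaves them out instead of attaching them to another province.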
# then:
changesinDF={dis:process.extractOne(dis,nomatch_gdf)[0] for dis in sorted(nomatch_df)}
Now, make the replacements:
covid19_vulnerables_Alarm_w.replace({'location': changesinDF}, inplace=True)
Is it over?
nomatch_df=set(covid19_vulnerables_Alarm_w.location)- set(provmap.location)
nomatch_gdf=set(provmap.location)-set(covid19_vulnerables_Alarm_w.location)
[(dis,process.extractOne(dis,nomatch_gdf)) for dis in sorted(nomatch_df)]
[]
Now the merge can happen:
covid19_vulnerables_Alarm_map=provmap.merge(covid19_vulnerables_Alarm_w, on='location',how='left',indicator='flag')
# check
covid19_vulnerables_Alarm_map.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 221 entries, 0 to 220
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   OBJECTID      221 non-null    float64
 1   CCDD          221 non-null    object
 2   CCPP          221 non-null    object
 3   DEPARTAMEN    221 non-null    object
 4   PROVINCIA_x   221 non-null    object
 5   geometry      221 non-null    geometry
 6   location      221 non-null    object
 7   DEPARTAMENTO  221 non-null    object
 8   PROVINCIA_y   221 non-null    object
 9   year2020      221 non-null    float64
 10  year2021      221 non-null    float64
 11  year2022      221 non-null    float64
 12  year2023      221 non-null    float64
 13  year2024      221 non-null    float64
 14  flag          221 non-null    category
dtypes: category(1), float64(6), geometry(1), object(7)
memory usage: 24.6+ KB
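The merged map has 221 rows while provmap had 196, because several data rows now share the same location key, so some polygons appear more than once. One way to catch this at merge time is the validate argument of pandas merge, which raises pandas.errors.MergeError when a key that should be unique is not. A toy sketch with made-up values:

```python
import pandas as pd
from pandas.errors import MergeError

polygons = pd.DataFrame({"location": ["ICA+NASCA", "AMAZONAS+BAGUA"]})
# Two data rows share the key ICA+NASCA (made-up values)
data = pd.DataFrame({
    "location": ["ICA+NASCA", "ICA+NASCA", "AMAZONAS+BAGUA"],
    "year2022": [0.30, 0.25, 0.34],
})

# Without validation the polygon is silently repeated: 2 polygons -> 3 rows
plain = polygons.merge(data, on="location", how="left")
print(len(plain))

# With validation the same merge is rejected
try:
    polygons.merge(data, on="location", how="left", validate="one_to_one")
    rejected = False
except MergeError as err:
    rejected = True
    print("merge rejected:", err)

# Which keys repeat on the data side
repeated = data.loc[data["location"].duplicated(), "location"].tolist()
print(repeated)
```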
# turn the categorical flag into text so that fillna(0) below does not fail (0 is not one of its categories)
covid19_vulnerables_Alarm_map['flag']=covid19_vulnerables_Alarm_map.flag.astype(str)
We can get rid of some columns:
covid19_vulnerables_Alarm_map.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 221 entries, 0 to 220
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   OBJECTID      221 non-null    float64
 1   CCDD          221 non-null    object
 2   CCPP          221 non-null    object
 3   DEPARTAMEN    221 non-null    object
 4   PROVINCIA_x   221 non-null    object
 5   geometry      221 non-null    geometry
 6   location      221 non-null    object
 7   DEPARTAMENTO  221 non-null    object
 8   PROVINCIA_y   221 non-null    object
 9   year2020      221 non-null    float64
 10  year2021      221 non-null    float64
 11  year2022      221 non-null    float64
 12  year2023      221 non-null    float64
 13  year2024      221 non-null    float64
 14  flag          221 non-null    object
dtypes: float64(6), geometry(1), object(8)
memory usage: 26.0+ KB
bye=['DEPARTAMENTO', 'CCPP','CCDD']
covid19_vulnerables_Alarm_map.drop(columns=bye,inplace=True)
# the remaining columns
covid19_vulnerables_Alarm_map.head()
| | OBJECTID | DEPARTAMEN | PROVINCIA_x | geometry | location | PROVINCIA_y | year2020 | year2021 | year2022 | year2023 | year2024 | flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | AMAZONAS | CHACHAPOYAS | POLYGON ((-77.72614 -5.94354, -77.72486 -5.943... | AMAZONAS+CHACHAPOYAS | CHACHAPOYAS | 0.273486 | 0.321394 | 0.268201 | 0.417476 | 0.440860 | both |
| 1 | 2.0 | AMAZONAS | BAGUA | POLYGON ((-78.61909 -4.51001, -78.61802 -4.510... | AMAZONAS+BAGUA | BAGUA | 0.370885 | 0.391144 | 0.339266 | 0.533333 | 0.458333 | both |
| 2 | 3.0 | AMAZONAS | BONGARA | POLYGON ((-77.72759 -5.14030, -77.72361 -5.140... | AMAZONAS+BONGARA | BONGARA | 0.348485 | 0.363825 | 0.305233 | 0.500000 | 0.600000 | both |
| 3 | 4.0 | AMAZONAS | CONDORCANQUI | POLYGON ((-77.81399 -2.99278, -77.81483 -2.995... | AMAZONAS+CONDORCANQUI | CONDORCANQUI | 0.238017 | 0.339367 | 0.205714 | 0.000000 | 0.000000 | both |
| 4 | 5.0 | AMAZONAS | LUYA | POLYGON ((-78.13023 -5.90370, -78.13011 -5.904... | AMAZONAS+LUYA | LUYA | 0.383117 | 0.368317 | 0.309783 | 0.346154 | 0.400000 | both |
# filling with zeroes
covid19_vulnerables_Alarm_map.fillna(0,inplace=True)
We can save this geoDF:
import os
covid19_vulnerables_Alarm_map.to_file(
os.path.join('C:\\Users\\Luis\\Documents\\GitHub\\covid_19', "provinciasPeru.gpkg"),
layer='provinciasCovid19',
driver="GPKG"
)
Exploring one variable
This time, we explore one variable of the map statistically:
# statistics
covid19_vulnerables_Alarm_map.year2022.describe()
count    221.000000
mean       0.324225
std        0.067366
min        0.000000
25%        0.289458
50%        0.321721
75%        0.360688
max        0.600000
Name: year2022, dtype: float64
A visual look:
import seaborn as sea
sea.boxplot(covid19_vulnerables_Alarm_map.year2022, color='yellow',orient='h')
[Figure: horizontal boxplot of year2022]
from sklearn.preprocessing import QuantileTransformer
qt = QuantileTransformer(n_quantiles=100, random_state=0,output_distribution='normal')
qt_result=qt.fit_transform(covid19_vulnerables_Alarm_map[['year2022']])
sea.boxplot(qt_result, color='yellow',orient='h')
[Figure: horizontal boxplot of the quantile-transformed year2022]
covid19_vulnerables_Alarm_map['year_2022_qt']=qt_result
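QuantileTransformer(output_distribution='normal') replaces each value by its estimated quantile and then by the standard normal value at that quantile, which keeps the order of the provinces but pulls in the extremes. A rough stand-in with pandas ranks and the standard library's NormalDist shows the idea; this is a rank-based normal-score transform on made-up values, not sklearn's exact interpolation.

```python
from statistics import NormalDist
import pandas as pd

values = pd.Series([0.0, 0.29, 0.32, 0.36, 0.60])  # made-up shares with extreme ends

# rank -> position strictly inside (0, 1) -> standard normal quantile
positions = (values.rank() - 0.5) / len(values)
normal_scores = positions.map(NormalDist().inv_cdf)
print(normal_scores.round(3).tolist())
```

The low and high values end up symmetric around 0 regardless of how far they were from the rest, which is what makes the transformed boxplot look less skewed.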
!pip install libpysal
Collecting libpysal Downloading libpysal-4.12.1-py3-none-any.whl.metadata (4.8 kB) Requirement already satisfied: beautifulsoup4>=4.10 in c:\users\luis\anaconda3\lib\site-packages (from libpysal) (4.12.3) Requirement already satisfied: geopandas>=0.10.0 in c:\users\luis\anaconda3\lib\site-packages (from libpysal) (0.14.2) Requirement already satisfied: numpy>=1.22 in c:\users\luis\anaconda3\lib\site-packages (from libpysal) (1.26.4) Requirement already satisfied: packaging>=22 in c:\users\luis\anaconda3\lib\site-packages (from libpysal) (23.2) Requirement already satisfied: pandas>=1.4 in c:\users\luis\anaconda3\lib\site-packages (from libpysal) (2.2.2) Requirement already satisfied: platformdirs>=2.0.2 in c:\users\luis\anaconda3\lib\site-packages (from libpysal) (3.10.0) Requirement already satisfied: requests>=2.27 in c:\users\luis\anaconda3\lib\site-packages (from libpysal) (2.32.2) Requirement already satisfied: scipy>=1.8 in c:\users\luis\anaconda3\lib\site-packages (from libpysal) (1.13.1) Requirement already satisfied: shapely>=2.0.1 in c:\users\luis\anaconda3\lib\site-packages (from libpysal) (2.0.5) Requirement already satisfied: scikit-learn>=1.1 in c:\users\luis\anaconda3\lib\site-packages (from libpysal) (1.4.2) Requirement already satisfied: soupsieve>1.2 in c:\users\luis\anaconda3\lib\site-packages (from beautifulsoup4>=4.10->libpysal) (2.5) Requirement already satisfied: fiona>=1.8.21 in c:\users\luis\anaconda3\lib\site-packages (from geopandas>=0.10.0->libpysal) (1.9.5) Requirement already satisfied: pyproj>=3.3.0 in c:\users\luis\anaconda3\lib\site-packages (from geopandas>=0.10.0->libpysal) (3.6.1) Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\luis\anaconda3\lib\site-packages (from pandas>=1.4->libpysal) (2.9.0.post0) Requirement already satisfied: pytz>=2020.1 in c:\users\luis\anaconda3\lib\site-packages (from pandas>=1.4->libpysal) (2024.1) Requirement already satisfied: tzdata>=2022.7 in 
c:\users\luis\anaconda3\lib\site-packages (from pandas>=1.4->libpysal) (2023.3) Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\luis\anaconda3\lib\site-packages (from requests>=2.27->libpysal) (2.0.4) Requirement already satisfied: idna<4,>=2.5 in c:\users\luis\anaconda3\lib\site-packages (from requests>=2.27->libpysal) (3.7) Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\luis\anaconda3\lib\site-packages (from requests>=2.27->libpysal) (2.2.2) Requirement already satisfied: certifi>=2017.4.17 in c:\users\luis\anaconda3\lib\site-packages (from requests>=2.27->libpysal) (2024.8.30) Requirement already satisfied: joblib>=1.2.0 in c:\users\luis\anaconda3\lib\site-packages (from scikit-learn>=1.1->libpysal) (1.4.2) Requirement already satisfied: threadpoolctl>=2.0.0 in c:\users\luis\anaconda3\lib\site-packages (from scikit-learn>=1.1->libpysal) (2.2.0) Requirement already satisfied: attrs>=19.2.0 in c:\users\luis\anaconda3\lib\site-packages (from fiona>=1.8.21->geopandas>=0.10.0->libpysal) (23.1.0) Requirement already satisfied: click~=8.0 in c:\users\luis\anaconda3\lib\site-packages (from fiona>=1.8.21->geopandas>=0.10.0->libpysal) (8.1.7) Requirement already satisfied: click-plugins>=1.0 in c:\users\luis\anaconda3\lib\site-packages (from fiona>=1.8.21->geopandas>=0.10.0->libpysal) (1.1.1) Requirement already satisfied: cligj>=0.5 in c:\users\luis\anaconda3\lib\site-packages (from fiona>=1.8.21->geopandas>=0.10.0->libpysal) (0.7.2) Requirement already satisfied: six in c:\users\luis\anaconda3\lib\site-packages (from fiona>=1.8.21->geopandas>=0.10.0->libpysal) (1.16.0) Requirement already satisfied: setuptools in c:\users\luis\anaconda3\lib\site-packages (from fiona>=1.8.21->geopandas>=0.10.0->libpysal) (69.5.1) Requirement already satisfied: colorama in c:\users\luis\anaconda3\lib\site-packages (from click~=8.0->fiona>=1.8.21->geopandas>=0.10.0->libpysal) (0.4.6) Downloading libpysal-4.12.1-py3-none-any.whl (2.8 MB) 
Installing collected packages: libpysal Successfully installed libpysal-4.12.1
from libpysal.weights import Queen, Rook, KNN
# rook
w_rook = Rook.from_dataframe(covid19_vulnerables_Alarm_map,use_index=False)
# queen
w_queen = Queen.from_dataframe(covid19_vulnerables_Alarm_map,use_index=False)
# k nearest neighbors
w_knn = KNN.from_dataframe(covid19_vulnerables_Alarm_map, k=8)
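The three definitions differ in what counts as a neighbor: rook contiguity requires a shared edge, queen contiguity also accepts a shared corner, and k-nearest neighbors ignores borders and takes the k closest points. A small pure-Python sketch on a 3 x 3 grid of square cells (a toy layout, not the provinces; the helper names are hypothetical) shows the counts:

```python
import math

def grid_neighbors(rows, cols, kind):
    """Neighbors of every cell in a rows x cols grid of unit squares.
    kind='rook'  -> cells sharing an edge
    kind='queen' -> cells sharing an edge or a corner"""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if kind == "queen":
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    result = {}
    for r in range(rows):
        for c in range(cols):
            result[(r, c)] = sorted(
                (r + dr, c + dc) for dr, dc in steps
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            )
    return result

def knn(points, k):
    """k nearest neighbors of each point by Euclidean distance (self excluded)."""
    out = {}
    for p in points:
        others = sorted((q for q in points if q != p), key=lambda q: math.dist(p, q))
        out[p] = sorted(others[:k])
    return out

rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")
centers = [(r, c) for r in range(3) for c in range(3)]  # cell centers, up to a shift
nearest = knn(centers, k=4)

print(len(rook[(1, 1)]), len(queen[(1, 1)]))  # center cell
print(len(rook[(0, 0)]), len(queen[(0, 0)]))  # corner cell
```

On irregular polygons such as provinces, corner-only contacts are less common, which may be why the rook and queen lists for Chachapoyas turn out identical. The knn helper breaks distance ties by input order; libpysal's KNN may break them differently.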
Let's understand the differences:
# first one
covid19_vulnerables_Alarm_map.head(1)
| | OBJECTID | DEPARTAMEN | PROVINCIA_x | geometry | location | PROVINCIA_y | year2020 | year2021 | year2022 | year2023 | year2024 | flag | year_2022_qt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | AMAZONAS | CHACHAPOYAS | POLYGON ((-77.72614 -5.94354, -77.72486 -5.943... | AMAZONAS+CHACHAPOYAS | CHACHAPOYAS | 0.273486 | 0.321394 | 0.268201 | 0.417476 | 0.44086 | both | -0.932398 |
# row positions of the neighbors of the first province (row 0, Chachapoyas)
w_rook.neighbors[0]
[2, 63, 4, 5, 139, 205, 207]
# see
base=covid19_vulnerables_Alarm_map[covid19_vulnerables_Alarm_map.PROVINCIA_x=="CHACHAPOYAS"].plot()
covid19_vulnerables_Alarm_map.iloc[w_rook.neighbors[0] ,].plot(ax=base,facecolor="yellow",edgecolor='k')
covid19_vulnerables_Alarm_map.head(1).plot(ax=base,facecolor="red")
[Figure: Chachapoyas (red) and its rook neighbors (yellow)]
Let's do the same with the queen weights:
w_queen.neighbors[0]
[2, 63, 4, 5, 139, 205, 207]
base=covid19_vulnerables_Alarm_map[covid19_vulnerables_Alarm_map.PROVINCIA_x=="CHACHAPOYAS"].plot()
covid19_vulnerables_Alarm_map.iloc[w_queen.neighbors[0] ,].plot(ax=base,facecolor="yellow",edgecolor='k')
covid19_vulnerables_Alarm_map.head(1).plot(ax=base,facecolor="red")
w_knn.neighbors[0]
[5, 4, 63, 207, 2, 67, 200, 139]
base=covid19_vulnerables_Alarm_map[covid19_vulnerables_Alarm_map.PROVINCIA_x=="CHACHAPOYAS"].plot()
covid19_vulnerables_Alarm_map.iloc[w_knn.neighbors[0] ,].plot(ax=base,facecolor="yellow",edgecolor='k')
covid19_vulnerables_Alarm_map.head(1).plot(ax=base,facecolor="red")
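KNN ignores shared borders: each province gets the k units whose representative points (centroids, as far as we know for libpysal's `from_dataframe`) are closest, so every unit has exactly k neighbors. A brute-force numpy sketch with hypothetical 2D points (`knn_neighbors` is our helper, not the libpysal API):

```python
import numpy as np

def knn_neighbors(points, k):
    """Indices of the k nearest *other* points for each point (Euclidean distance)."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]          # pairwise coordinate differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise distance matrix
    np.fill_diagonal(dist, np.inf)                    # a point is not its own neighbor
    order = np.argsort(dist, axis=1, kind="stable")   # closest first
    return {i: order[i, :k].tolist() for i in range(len(pts))}

# hypothetical centroids
centroids = [(0, 0), (1, 0), (0, 2), (5, 5)]
nn = knn_neighbors(centroids, k=2)
print(nn)  # {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [2, 1]}
```

Unlike contiguity weights, this relation need not be symmetric: point 3 lists points 2 and 1, but neither of them lists point 3.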
Let's take a closer look at the queen results:
# all the neighbors by row
w_queen.neighbors
{0: [2, 63, 4, 5, 139, 205, 207],
1: [3, 68, 69, 6],
2: [0, 207, 3, 4, 6, 168, 200],
3: [168, 1, 2, 6],
4: [0, 64, 2, 6, 63],
5: [0, 200, 203, 205, 207],
6: [64, 1, 2, 3, 68, 4, 66],
7: [34, 20, 22, 8, 24, 25, 31],
8: [25, 31, 7],
9: [98, 10, 11, 12, 13, 14, 15, 16, 17, 21, 24],
10: [98, 9, 11, 12, 13, 14, 15, 16, 17, 21, 24],
11: [98, 9, 10, 12, 13, 14, 15, 16, 17, 21, 24],
12: [98, 9, 10, 11, 13, 14, 15, 16, 17, 21, 24],
13: [98, 9, 10, 11, 12, 14, 15, 16, 17, 21, 24],
14: [98, 9, 10, 11, 12, 13, 15, 16, 17, 21, 24],
15: [98, 9, 10, 11, 12, 13, 14, 16, 17, 21, 24],
16: [98, 9, 10, 11, 12, 13, 14, 15, 17, 21, 24],
17: [98, 9, 10, 11, 12, 13, 14, 15, 16, 21, 24],
18: [24, 34, 20, 21],
19: [97, 99, 104, 24, 153, 154, 28, 25, 31],
20: [24, 34, 18, 7],
21: [98, 34, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 24, 27],
22: [32, 25, 34, 7],
23: [32, 33, 26, 29],
24: [7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 31, 98, 99],
25: [19, 22, 7, 8, 153, 31],
26: [32, 33, 34, 23, 30],
27: [34, 98, 21, 101, 30],
28: [153, 154, 19, 159],
29: [32, 33, 144, 146, 148, 23],
30: [144, 33, 34, 101, 26, 27],
31: [19, 7, 24, 8, 25],
32: [34, 146, 148, 22, 23, 26, 29],
33: [144, 23, 26, 29, 30],
34: [32, 7, 18, 20, 21, 22, 26, 27, 30],
35: [83, 36, 37, 38, 39, 41, 77],
36: [35, 38, 40, 83, 54, 55, 56, 58, 60],
37: [81, 49, 35, 38, 39, 56, 41],
38: [56, 35, 36, 37],
39: [81, 35, 84, 37, 41, 77],
40: [50, 36, 54, 60],
41: [35, 37, 39],
42: [48, 197, 46, 43, 174],
43: [48, 42, 44, 45, 46, 47],
44: [55, 56, 57, 43, 108, 109, 110, 47, 111, 49, 112, 113, 114, 115, 116, 119, 120, 121, 122, 123, 117, 125, 118, 124],
45: [43, 46, 47],
46: [193, 197, 42, 43, 45, 47, 81, 82],
47: [49, 81, 82, 43, 44, 45, 46],
48: [42, 43, 173, 174, 175],
49: [81, 37, 56, 57, 44, 47],
50: [51, 53, 54, 40, 90, 60, 93],
51: [50, 59, 60, 93],
52: [59, 93, 55],
53: [133, 50, 83, 54, 89, 90, 92, 94],
54: [50, 83, 36, 53, 40],
55: [119, 56, 93, 36, 58, 59, 44, 108, 109, 111, 110, 113, 114, 115, 52, 116, 117, 112, 120, 121, 122, 123, 118, 125, 126, 124],
56: [49, 36, 37, 38, 55, 57, 44],
57: [56, 49, 44],
58: [59, 36, 60, 55],
59: [51, 52, 55, 58, 60, 93],
60: [50, 51, 36, 40, 58, 59],
61: [65, 67, 147, 70, 72, 62, 63],
62: [145, 147, 70, 139, 61, 142],
63: [0, 64, 67, 4, 70, 139, 61],
64: [66, 67, 4, 6, 73, 149, 150, 63],
65: [147, 71, 72, 138, 140, 61, 143],
66: [64, 68, 150, 6],
67: [64, 71, 72, 73, 61, 63],
68: [1, 66, 69, 6, 181, 150, 151],
69: [1, 68, 181],
70: [139, 61, 62, 63],
71: [65, 67, 149, 72, 73, 140],
72: [65, 67, 61, 71],
73: [64, 67, 149, 71],
74: [152],
75: [84, 86, 87, 77, 78],
76: [80, 81, 84, 86, 79],
77: [35, 83, 84, 39, 87, 75],
78: [83, 85, 86, 87, 75, 171],
79: [80, 81, 194, 82, 76],
80: [194, 86, 76, 189, 79],
81: [37, 39, 76, 46, 47, 79, 49, 82, 84],
82: [193, 194, 81, 79, 46, 47],
83: [35, 36, 133, 171, 77, 78, 53, 54, 87, 218],
84: [81, 86, 39, 75, 76, 77],
85: [78, 171, 86],
86: [75, 76, 171, 78, 80, 84, 85, 189],
87: [75, 83, 77, 78],
88: [128, 161, 89, 90, 91, 92, 93, 94],
89: [88, 90, 92, 53],
90: [50, 53, 88, 89, 93],
91: [161, 88, 107, 93, 127],
92: [88, 89, 53, 94],
93: [59, 106, 50, 51, 52, 55, 88, 90, 91, 126, 127],
94: [128, 53, 133, 88, 92],
95: [96, 97, 100, 102, 104, 105],
96: [176, 177, 102, 104, 95],
97: [99, 100, 19, 104, 105, 95],
98: [99, 100, 101, 9, 10, 11, 12, 13, 14, 15, 16, 17, 21, 24, 27],
99: [97, 98, 19, 100, 24],
100: [97, 98, 99, 101, 102, 167, 103, 201, 209, 219, 95],
101: [144, 209, 98, 100, 27, 30],
102: [96, 176, 178, 100, 103, 95],
103: [178, 100, 102, 217, 219],
104: [96, 97, 160, 105, 177, 19, 154, 95],
105: [104, 97, 95],
106: [93, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127],
107: [161, 91, 156, 127],
108: [119, 106, 44, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 55, 120, 121, 122, 123, 124, 125, 126],
109: [119, 106, 44, 108, 110, 111, 112, 113, 114, 115, 116, 117, 118, 55, 120, 121, 122, 123, 124, 125, 126],
110: [119, 106, 44, 108, 109, 111, 112, 113, 114, 115, 116, 117, 118, 55, 120, 121, 122, 123, 124, 125, 126],
111: [119, 106, 44, 108, 109, 110, 112, 113, 114, 115, 116, 117, 118, 55, 120, 121, 122, 123, 124, 125, 126],
112: [119, 106, 44, 108, 109, 110, 111, 113, 114, 115, 116, 117, 118, 55, 120, 121, 122, 123, 124, 125, 126],
113: [119, 106, 44, 108, 109, 110, 111, 112, 114, 115, 116, 117, 118, 55, 120, 121, 122, 123, 124, 125, 126],
114: [119, 106, 44, 108, 109, 110, 111, 112, 113, 115, 116, 117, 118, 55, 120, 121, 122, 123, 124, 125, 126],
115: [119, 106, 44, 108, 109, 110, 111, 112, 113, 114, 116, 117, 118, 55, 120, 121, 122, 123, 124, 125, 126],
116: [119, 106, 44, 108, 109, 110, 111, 112, 113, 114, 115, 117, 118, 55, 120, 121, 122, 123, 124, 125, 126],
117: [119, 106, 44, 108, 109, 110, 111, 112, 113, 114, 115, 116, 118, 55, 120, 121, 122, 123, 124, 125, 126],
118: [119, 106, 44, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 55, 120, 121, 122, 123, 124, 125, 126],
119: [120, 106, 44, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 55, 118, 121, 122, 123, 124, 125, 126],
120: [119, 106, 44, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 55, 118, 121, 122, 123, 124, 125, 126],
121: [119, 120, 106, 44, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 55, 118, 122, 123, 124, 125, 126],
122: [119, 120, 106, 44, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 55, 118, 121, 123, 124, 125, 126],
123: [119, 120, 106, 44, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 55, 118, 121, 122, 124, 125, 126],
124: [119, 120, 106, 44, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 55, 118, 121, 122, 123, 125, 126],
125: [119, 120, 106, 44, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 55, 118, 121, 122, 123, 124, 126],
126: [119, 93, 106, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 55, 120, 121, 122, 123, 124, 125],
127: [91, 106, 107, 93],
128: [161, 129, 133, 136, 88, 94],
129: [128, 161, 131, 133, 136],
130: [178, 131, 132, 133, 134],
131: [129, 130, 161, 133, 134, 135, 158],
132: [176, 178, 130, 134, 135],
133: [128, 129, 130, 131, 178, 83, 53, 218, 94],
134: [130, 131, 132, 135],
135: [176, 131, 132, 134, 155, 157, 158],
136: [128, 161, 129],
137: [138, 148, 141, 142],
138: [65, 147, 137, 142, 143],
139: [0, 144, 145, 70, 205, 62, 63],
140: [143, 65, 149, 71],
141: [137, 146, 148, 142],
142: [145, 146, 147, 137, 138, 141, 62],
143: [65, 138, 140],
144: [33, 101, 139, 205, 209, 146, 145, 29, 30],
145: [144, 146, 142, 139, 62],
146: [32, 144, 145, 29, 148, 141, 142],
147: [65, 138, 142, 61, 62],
148: [32, 146, 141, 137, 29],
149: [64, 150, 71, 151, 73, 140],
150: [64, 66, 68, 149, 151],
151: [179, 68, 181, 150, 149, 182, 186],
152: [74, 155, 156, 157, 158],
153: [25, 19, 28, 159],
154: [160, 19, 104, 28, 159],
155: [152, 157, 158, 135],
156: [152, 161, 107, 158],
157: [176, 135, 152, 155, 159],
158: [161, 131, 135, 152, 155, 156],
159: [160, 176, 153, 154, 28, 157],
160: [176, 177, 104, 154, 159],
161: [128, 129, 131, 136, 107, 88, 91, 156, 158],
162: [169, 164, 165, 166],
163: [208, 164, 166, 200, 204, 168],
164: [168, 162, 163, 166],
165: [169, 162, 166],
166: [208, 162, 163, 164, 165, 167, 217],
167: [208, 100, 217, 166, 201, 219, 206],
168: [2, 3, 163, 164, 200],
169: [162, 165],
170: [198, 218, 171, 172, 189],
171: [83, 85, 86, 218, 170, 189, 78],
172: [170, 218, 220],
173: [48, 191, 211, 212, 187, 174, 175],
174: [48, 197, 42, 187, 173],
175: [48, 212, 173],
176: [96, 160, 132, 102, 135, 177, 178, 157, 159],
177: [96, 104, 176, 160],
178: [130, 132, 133, 102, 103, 176, 217, 218],
179: [180, 182, 151, 184, 183, 186],
180: [184, 179, 181, 182],
181: [68, 69, 182, 151, 180],
182: [179, 180, 181, 151],
183: [184, 185, 186, 179],
184: [179, 180, 214, 183, 185, 215],
185: [184, 215, 183],
186: [151, 179, 183],
187: [192, 197, 173, 174, 191],
188: [192, 193, 194, 196, 197, 189],
189: [194, 196, 198, 170, 171, 80, 86, 188],
190: [199, 191],
191: [211, 213, 187, 173, 190],
192: [195, 196, 197, 187, 188],
193: [194, 82, 197, 188, 46],
194: [80, 193, 82, 188, 189, 79],
195: [192],
196: [192, 188, 189, 198],
197: [192, 193, 188, 42, 187, 174, 46],
198: [170, 196, 189],
199: [190],
200: [2, 163, 5, 168, 202, 203, 204, 207],
201: [209, 100, 167, 202, 203, 205, 206],
202: [200, 201, 203, 204, 206],
203: [5, 200, 201, 202, 205],
204: [208, 163, 200, 202, 206],
205: [0, 144, 209, 5, 201, 139, 203],
206: [208, 167, 201, 202, 204],
207: [0, 2, 5, 200],
208: [163, 166, 167, 204, 206],
209: [144, 100, 101, 201, 205],
210: [211, 212, 213],
211: [210, 212, 213, 173, 191],
212: [210, 211, 213, 173, 175],
213: [210, 211, 212, 191],
214: [184, 216, 215],
215: [184, 185, 214],
216: [214],
217: [178, 166, 103, 167, 218, 219],
218: [133, 170, 171, 172, 178, 83, 217, 220],
219: [167, 100, 217, 103],
220: [218, 172]}
# the neighborhood (adjacency) matrix:
pd.DataFrame(*w_queen.full()).astype(int) # 1 means both are neighbors
|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 211 | 212 | 213 | 214 | 215 | 216 | 217 | 218 | 219 | 220 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 216 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 217 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 218 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 219 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 220 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
221 rows × 221 columns
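`w_queen.full()` returns a tuple (dense array, list of ids), which is why it is unpacked with `*` into `pd.DataFrame(data, index)`. Cell (i, j) of the array is 1 when j appears in the neighbor list of i. A minimal numpy sketch of that expansion on a hypothetical 4-unit example (`neighbors_to_matrix` is our helper, not part of libpysal):

```python
import numpy as np

def neighbors_to_matrix(neighbors):
    """Dense binary n x n matrix from a {unit: [neighbor units]} dict."""
    ids = sorted(neighbors)
    pos = {unit: i for i, unit in enumerate(ids)}   # unit id -> row/column position
    M = np.zeros((len(ids), len(ids)), dtype=int)
    for unit, nbrs in neighbors.items():
        for other in nbrs:
            M[pos[unit], pos[other]] = 1            # 1 means "other" is a neighbor of "unit"
    return M

toy = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
M = neighbors_to_matrix(toy)
print(M)
print((M == M.T).all())  # True: contiguity is a symmetric relation
```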
# percentage of non-zero cells in the weights matrix (density)
w_queen.pct_nonzero
3.3005057226510512
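`pct_nonzero` is the percentage of non-zero cells in the n×n weights matrix, that is 100 × (number of neighbor links) / n². Back-solving from the value printed above gives 1612 links among the 221 provinces, about 7.3 queen neighbors per province on average. A small sketch of that arithmetic (`pct_nonzero` here is our own helper working on a plain dict):

```python
def pct_nonzero(neighbors):
    """Share (in %) of non-zero cells in the n x n matrix implied by a neighbors dict."""
    n = len(neighbors)
    links = sum(len(nbrs) for nbrs in neighbors.values())  # one non-zero cell per link
    return 100.0 * links / n ** 2

toy_line = {0: [1], 1: [0, 2], 2: [1]}   # 3 units on a line: 0-1-2
print(pct_nonzero(toy_line))             # 4 non-zero cells out of 9, about 44.44

# Back out the number of links behind the printed w_queen.pct_nonzero (n = 221 provinces)
n = 221
links = round(3.3005057226510512 * n ** 2 / 100)
print(links, round(links / n, 2))        # 1612 7.29
```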
# is there any province with NO neighbors (an "island")?
w_queen.islands
[]
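An island is a unit whose neighbor list is empty; since `w_queen.islands` is empty, every province has at least one queen neighbor. The sketch below runs the same check on a few entries copied from the `w_queen.neighbors` output above (an excerpt, not the full dictionary) and also picks out the provinces with the fewest neighbors:

```python
# Excerpt of w_queen.neighbors copied from the output above (not the full dict)
neighbors_excerpt = {
    74: [152],
    169: [162, 165],
    195: [192],
    199: [190],
    216: [214],
    220: [218, 172],
}

islands = [unit for unit, nbrs in neighbors_excerpt.items() if len(nbrs) == 0]
cardinality = {unit: len(nbrs) for unit, nbrs in neighbors_excerpt.items()}
min_card = min(cardinality.values())
fewest = sorted(unit for unit, c in cardinality.items() if c == min_card)
print(islands)  # []
print(fewest)   # [74, 195, 199, 216]
```

In the full output, rows 74, 195, 199 and 216 are the only provinces with a single queen neighbor.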
Moran's correlation
We need the neighborhood matrix (the spatial weights matrix) to compute spatial autocorrelation, that is, whether the value of a variable in a polygon is correlated with its values in the neighboring polygons. A significant correlation is evidence of a spatial effect.
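For reference, the textbook definition of global Moran's I, for n units with values $y_i$, mean $\bar{y}$ and spatial weights $w_{ij}$, is:

$$
I = \frac{n}{S_0}\,\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i-\bar{y})(y_j-\bar{y})}{\sum_{i=1}^{n}(y_i-\bar{y})^2},
\qquad S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}.
$$

With row-standardized weights every row sums to 1, so $S_0 = n$ and the leading factor $n/S_0$ equals 1.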
# row-standardize the weights (each row will sum to 1), the usual choice for spatial autocorrelation
w_queen.transform = 'R'
pd.DataFrame(*w_queen.full()).sum(axis=1) # after row-standardization every row sums to 1
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
...
216 1.0
217 1.0
218 1.0
219 1.0
220 1.0
Length: 221, dtype: float64
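Row standardization ('R') divides each weight by the sum of its row, which for binary contiguity is the number of neighbors. As a result the spatial lag of a unit, W·y, is the plain average of its neighbors' values. A numpy sketch on a hypothetical 3-unit binary matrix (`row_standardize` is our helper, not the libpysal API):

```python
import numpy as np

def row_standardize(B):
    """Divide each row of a weights matrix by its row sum; rows with no neighbors stay zero."""
    B = np.asarray(B, dtype=float)
    row_sums = B.sum(axis=1, keepdims=True)
    return np.divide(B, row_sums, out=np.zeros_like(B), where=row_sums > 0)

# unit 0 touches units 1 and 2; units 1 and 2 only touch unit 0
B = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
W = row_standardize(B)
print(W.sum(axis=1))  # [1. 1. 1.]

y = np.array([10.0, 20.0, 30.0])
print(W @ y)          # [25. 10. 10.]  (average of each unit's neighbors)
```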
Spatial correlation is measured by the Moran's I statistic:
!pip install esda
from esda.moran import Moran
morancovid19 = Moran(covid19_vulnerables_Alarm_map['year_2022_qt'], w_queen)
morancovid19.I,morancovid19.p_sim
(0.08667603252321662, 0.014)
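Moran's I compares each unit's deviation from the mean with the weighted average of its neighbors' deviations. `esda.moran.Moran` also reports `p_sim`, a pseudo p-value obtained by randomly permuting the values over the map (999 permutations by default). Below is a from-scratch numpy sketch of both ideas on a hypothetical 4-unit line. The helpers are ours, and the p-value convention ((count + 1)/(permutations + 1), using the smaller tail) is our assumption about what esda does, not a copy of its code:

```python
import numpy as np

def morans_i(y, W):
    """Global Moran's I for values y and an n x n weights matrix W."""
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    z = y - y.mean()          # deviations from the mean
    s0 = W.sum()              # sum of all weights
    return (len(y) / s0) * (z @ W @ z) / (z @ z)

def pseudo_p_value(y, W, permutations=999, seed=0):
    """Observed I and a folded one-sided permutation p-value (assumed convention)."""
    rng = np.random.default_rng(seed)
    observed = morans_i(y, W)
    sims = np.array([morans_i(rng.permutation(y), W) for _ in range(permutations)])
    larger = int((sims >= observed).sum())
    if permutations - larger < larger:   # use the smaller tail
        larger = permutations - larger
    return observed, (larger + 1) / (permutations + 1)

# hypothetical example: 4 units on a line (0-1-2-3), row-standardized weights
W = np.array([[0, 1, 0, 0],
              [0.5, 0, 0.5, 0],
              [0, 0.5, 0, 0.5],
              [0, 0, 1, 0]])
I, p = pseudo_p_value([1, 2, 3, 4], W)
print(round(I, 3))  # 0.4
```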
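Moran's I is positive but small (about 0.087), and its pseudo p-value (0.014) is below 0.05, so the spatial autocorrelation is weak but significant. Let's plot it: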
!pip install splot
from splot.esda import moran_scatterplot
import matplotlib.pyplot as plt
fig, ax = moran_scatterplot(morancovid19)
ax.set_xlabel('Covid19_alarma_share')
ax.set_ylabel('SpatialLag_Covid19_alarma_share')
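In a Moran scatterplot each point is a unit: its (standardized) value on the x-axis and its spatial lag, the weighted average of its neighbors' values, on the y-axis. With row-standardized weights, the least-squares slope of the lag on the values equals Moran's I. A numpy check of that identity on a hypothetical 4-unit line (this illustrates the math; it does not reproduce splot's drawing code):

```python
import numpy as np

# hypothetical 4-unit line 0-1-2-3 with row-standardized weights
W = np.array([[0, 1, 0, 0],
              [0.5, 0, 0.5, 0],
              [0, 0.5, 0, 0.5],
              [0, 0, 1, 0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

z = (y - y.mean()) / y.std()      # standardized values (x-axis)
lag = W @ z                       # spatial lag of the standardized values (y-axis)
slope = np.polyfit(z, lag, 1)[0]  # slope of the least-squares line
moran_i = (z @ lag) / (z @ z)     # Moran's I when every row of W sums to 1
print(round(slope, 3), round(moran_i, 3))  # 0.4 0.4
```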
Local Spatial Correlation
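We can compute a LISA (Local Indicator of Spatial Association, here a local Moran's I) for each province; note that the code below uses the KNN weights (k=8). This helps us find spatial clusters (spots) and spatial outliers: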
A hotSpot is a polygon whose value is high AND whose neighbors also have high values.
A coldSpot is a polygon whose value is low AND whose neighbors also have low values.
A coldOutlier is a polygon whose value is low BUT whose neighbors have high values.
A hotOutlier is a polygon whose value is high BUT whose neighbors have low values.
A polygon may also show no significant local correlation at all. Let's compute the values:
# The scatterplot with local info
from esda.moran import Moran_Local
# calculate Moran_Local and plot
lisacovid19 = Moran_Local(y=covid19_vulnerables_Alarm_map['year_2022_qt'], w=w_knn,seed=2022)
fig, ax = moran_scatterplot(lisacovid19,p=0.05)
ax.set_xlabel('Covid19_alarma_share')
ax.set_ylabel('SpatialLag_Covid19_alarma_share');
from splot.esda import plot_local_autocorrelation
plot_local_autocorrelation(lisacovid19, covid19_vulnerables_Alarm_map,'year_2022_qt')
plt.show()
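The quadrant logic behind these plots is simple: a unit's deviation from the mean (z) and its spatial lag (the weighted average of its neighbors' deviations) are each either above or below zero. A numpy sketch on a hypothetical 4-unit line, using the textbook local statistic I_i = z_i · lag_i / m2 with m2 = Σz²/n (esda's exact scaling may differ), the esda quadrant codes (1 HH, 2 LH, 3 LL, 4 HL), and the same p < 0.05 masking and labels used later in this notebook. The p-values are made up for illustration and `local_moran` is our helper:

```python
import numpy as np

LABELS = ['0 no_sig', '1 hotSpot', '2 coldOutlier', '3 coldSpot', '4 hotOutlier']

def local_moran(y, W):
    """Local Moran values and quadrant codes (1 HH, 2 LH, 3 LL, 4 HL)."""
    y = np.asarray(y, dtype=float)
    z = y - y.mean()                         # deviations from the mean
    lag = np.asarray(W, dtype=float) @ z     # weighted average of neighbors' deviations
    m2 = (z @ z) / len(z)
    local_i = z * lag / m2
    quadrant = np.where(z > 0, np.where(lag > 0, 1, 4), np.where(lag > 0, 2, 3))
    return local_i, quadrant

W = np.array([[0, 1, 0, 0],
              [0.5, 0, 0.5, 0],
              [0, 0.5, 0, 0.5],
              [0, 0, 1, 0]])
local_i, quadrant = local_moran([1, 2, 3, 4], W)
print(quadrant)  # [3 3 1 1]

p_hypothetical = [0.01, 0.30, 0.20, 0.04]   # made-up pseudo p-values
labels = [LABELS[q if pv < 0.05 else 0] for q, pv in zip(quadrant, p_hypothetical)]
print(labels)    # ['3 coldSpot', '0 no_sig', '0 no_sig', '1 hotSpot']
```

With row-standardized weights and this scaling, the local values average to the global Moran's I (0.4 for this toy line).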
Let's add these results to the GeoDataFrame:
# quadrant
lisacovid19.q
array([2, 4, 3, 3, 3, 4, 3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 4, 1, 1, 4,
1, 4, 1, 2, 1, 3, 1, 4, 4, 1, 2, 4, 1, 2, 2, 1, 1, 2, 1, 2, 2, 1,
3, 1, 1, 1, 4, 1, 2, 1, 1, 2, 2, 4, 1, 1, 1, 2, 1, 2, 4, 4, 4, 1,
3, 2, 3, 2, 2, 2, 1, 4, 3, 3, 2, 4, 4, 4, 4, 4, 2, 3, 3, 4, 3, 3,
2, 1, 2, 4, 4, 2, 1, 3, 4, 4, 1, 3, 4, 3, 3, 3, 3, 2, 3, 3, 4, 3,
3, 4, 3, 3, 4, 3, 4, 4, 3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 2, 1, 4, 1,
4, 2, 4, 2, 1, 1, 1, 4, 1, 3, 1, 1, 3, 3, 3, 1, 3, 2, 2, 4, 3, 1,
1, 4, 2, 3, 4, 2, 2, 4, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 2, 1, 4,
3, 3, 3, 2, 4, 4, 4, 2, 4, 2, 3, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2,
1, 1, 3, 4, 3, 3, 4, 4, 2, 4, 3, 3, 2, 4, 2, 1, 2, 4, 4, 4, 4, 3,
3])
# significance
lisacovid19.p_sim
array([0.218, 0.061, 0.464, 0.216, 0.446, 0.27 , 0.314, 0.042, 0.099,
0.454, 0.28 , 0.378, 0.368, 0.181, 0.287, 0.271, 0.122, 0.135,
0.395, 0.115, 0.202, 0.355, 0.14 , 0.467, 0.458, 0.022, 0.128,
0.451, 0.075, 0.094, 0.404, 0.029, 0.366, 0.263, 0.255, 0.366,
0.221, 0.235, 0.074, 0.4 , 0.237, 0.199, 0.252, 0.117, 0.428,
0.272, 0.15 , 0.079, 0.449, 0.023, 0.293, 0.239, 0.068, 0.264,
0.104, 0.161, 0.009, 0.013, 0.095, 0.024, 0.201, 0.205, 0.145,
0.316, 0.404, 0.324, 0.346, 0.452, 0.422, 0.435, 0.318, 0.22 ,
0.139, 0.384, 0.08 , 0.364, 0.396, 0.186, 0.229, 0.304, 0.265,
0.359, 0.174, 0.319, 0.47 , 0.338, 0.388, 0.463, 0.075, 0.237,
0.256, 0.487, 0.419, 0.295, 0.377, 0.179, 0.023, 0.035, 0.446,
0.179, 0.01 , 0.267, 0.044, 0.417, 0.491, 0.496, 0.033, 0.295,
0.021, 0.012, 0.021, 0.023, 0.023, 0.025, 0.01 , 0.029, 0.023,
0.006, 0.023, 0.022, 0.022, 0.03 , 0.022, 0.022, 0.022, 0.024,
0.015, 0.367, 0.044, 0.275, 0.486, 0.222, 0.412, 0.115, 0.441,
0.352, 0.036, 0.447, 0.086, 0.168, 0.183, 0.24 , 0.375, 0.086,
0.1 , 0.184, 0.223, 0.328, 0.379, 0.164, 0.473, 0.37 , 0.27 ,
0.037, 0.377, 0.241, 0.474, 0.478, 0.395, 0.234, 0.407, 0.286,
0.024, 0.068, 0.037, 0.136, 0.078, 0.097, 0.11 , 0.034, 0.446,
0.301, 0.046, 0.424, 0.188, 0.353, 0.321, 0.233, 0.477, 0.459,
0.429, 0.258, 0.178, 0.481, 0.494, 0.444, 0.492, 0.001, 0.017,
0.011, 0.002, 0.06 , 0.009, 0.138, 0.286, 0.003, 0.017, 0.001,
0.062, 0.003, 0.375, 0.001, 0.433, 0.329, 0.101, 0.188, 0.241,
0.219, 0.265, 0.009, 0.138, 0.494, 0.135, 0.209, 0.388, 0.474,
0.477, 0.343, 0.064, 0.013, 0.235])
# quadrant: 1 HH, 2 LH, 3 LL, 4 HL
pd.Series(lisacovid19.q).value_counts()
3    63
1    57
4    55
2    46
Name: count, dtype: int64
The quadrants in lisacovid19.q cannot be used as they are: we first keep only the provinces whose local spatial correlation is significant (p_sim < 0.05) and set the rest to 0:
covid19_vulnerables_Alarm_map['Covid19_quadrant']=[l if p <0.05 else 0 for l,p in zip(lisacovid19.q,lisacovid19.p_sim) ]
covid19_vulnerables_Alarm_map['Covid19_quadrant'].value_counts()
Covid19_quadrant
0    171
3     20
1     12
4     11
2      7
Name: count, dtype: int64
Now we recode the quadrant numbers as labels:
labels = [ '0 no_sig', '1 hotSpot', '2 coldOutlier', '3 coldSpot', '4 hotOutlier']
covid19_vulnerables_Alarm_map['Covid19_quadrant_names']=[labels[i] for i in covid19_vulnerables_Alarm_map['Covid19_quadrant']]
covid19_vulnerables_Alarm_map['Covid19_quadrant_names'].value_counts()
Covid19_quadrant_names
0 no_sig        171
3 coldSpot       20
1 hotSpot        12
4 hotOutlier     11
2 coldOutlier     7
Name: count, dtype: int64
Let's replot:
from matplotlib import colors
myColMap = colors.ListedColormap([ 'ghostwhite', 'red', 'green', 'black','orange'])
f, ax = plt.subplots(1, figsize=(12,12))
plt.title('Spots and Outliers')
covid19_vulnerables_Alarm_map.plot(column='Covid19_quadrant_names',
categorical=True,
cmap=myColMap,
linewidth=0.1,
edgecolor='white',
legend=True,
legend_kwds={'loc': 'center left',
'bbox_to_anchor': (0.7, 0.6)},
ax=ax)
# Remove axis
ax.set_axis_off()
# Display the map
plt.show()
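The numeric prefixes in the labels matter for the colormap. Assuming, as we understand `categorical=True`, that geopandas orders the categories by sorting their values, the i-th color of `myColMap` goes to the i-th sorted label. A plain-Python sketch of that pairing, reusing the labels and colors from the cells above:

```python
# Labels and colors as defined in the cells above
labels = ['0 no_sig', '1 hotSpot', '2 coldOutlier', '3 coldSpot', '4 hotOutlier']
colors = ['ghostwhite', 'red', 'green', 'black', 'orange']

# Sorting keeps the 0..4 order thanks to the numeric prefixes
print(sorted(labels) == labels)                        # True
color_of = dict(zip(sorted(labels), colors))
print(color_of['1 hotSpot'], color_of['3 coldSpot'])   # red black

# Without the prefixes the alphabetical order (and so the colors) would change
plain = [label.split(' ', 1)[1] for label in labels]
print(sorted(plain))  # ['coldOutlier', 'coldSpot', 'hotOutlier', 'hotSpot', 'no_sig']
```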
covid19_vulnerables_Alarm_map.explore("Covid19_quadrant_names", categorical=True,tooltip='location',cmap=myColMap)